Abstract: We consider the problem of quickest localization
of  anomaly  in  a  resource-constrained  cyber  network  consisting 
of  multiple  components.  Due  to  resource  constraints,  only  one 
component  can  be  probed  at  each  time.  The  observations  are 
random  realizations  drawn  from  two  different  distributions 
depending  on  whether  the  component  is  normal  or  anomalous. 
Components  are  assigned  priorities.  Components  with  higher 
priorities  in  an  abnormal  state  should  be  fixed  before  components 
with  lower  priorities  to  reduce  the  overall  damage  to  the  network. 
We formulate the problem as a priority-based constrained optimization
problem. The objective is to minimize the expected
weighted  sum  of  completion  times  of  abnormal  components 
subject  to  error  probability  constraints.  We  then  propose  a 
two-stage  optimization  formulation  to  solve  the  problem.  First, 
we  consider  the  independent  model,  where  each  component  is 
abnormal  independent  of  other  components.  Next,  we  consider 
the exclusive model, where one and only one component is abnormal.
We  develop  optimal  index  policies  under  both  models.  Optimal 
low-complexity  algorithms  are  derived  for  the  simple  hypotheses 
case,  where  the  distribution  is  completely  known  under  both 
hypotheses.  Asymptotically  (as  the  error  probability  approaches 
zero)  optimal  low-complexity  algorithms  are  derived  for  the 
composite  hypotheses  case,  where  there  is  uncertainty  in  the 
distribution  parameters.  Simulation  results  then  illustrate  the 
performance  of  the  algorithms. 

Index Terms: Anomaly detection, Intrusion Detection System
(IDS), sequential hypothesis testing, detection under
uncertainty.

I.  Introduction 

An intrusion detection system (IDS) monitors the network
to detect malicious activities (i.e., attacks).
Once an IDS determines that a malicious
activity has occurred, it alerts the security administrator
or initiates a proper response to the malicious activity. Good
surveys of IDSs can be found in [1], [2]. Here, we focus
on  anomaly  detection,  where  statistical  methods  are  used  to 
detect  deviations  from  normal  operation.  Quickest  detection 
of  anomaly  subject  to  reliability  constraints  is  an  important 
requirement  when  designing  intrusion  detection  schemes.  The 
sooner  an  IDS  detects  malicious  activities,  the  lower  the 
resulting damage to the network. Existing
techniques for anomaly detection can be found in [3]-[16].

In this paper we address the problem of quickest localization
of anomaly in a resource-constrained cyber network. We consider
a network with K heterogeneous components, which can
be paths, routers, or subnets. Assume that an intrusion has been
detected. The goal here is to locate the infected components as
quickly and as reliably as possible. Most existing studies on
anomaly detection do not consider the constraint on the system
monitoring capacity. Here, we focus on resource-constrained
intrusion detection in cyber networks, as was done in [15]-[17].
Due to resource constraints, only one component can be
probed  at  each  time.  The  observations  are  random  realizations 
drawn  from  two  different  distributions  depending  on  whether 
the component is normal or anomalous. The completion time
of component k is defined as the time at which the IDS completes
testing component k. Components are assigned priorities.
Components  with  higher  priorities  in  an  abnormal  state  should 
be  fixed  before  those  with  lower  priorities  to  reduce  the  overall 
damage  to  the  network. 

Throughout this paper we use the theory of sequential
detection. In sequential tests, after each observation has been
collected, the detector decides whether to accept $H_0$, reject
$H_0$, or take another observation. The sample size achieved
by sequential tests can be significantly reduced as compared
to fixed-size tests. Therefore, sequential testing is a natural approach for
quickest localization of anomaly. Sequential detection has
been extensively studied in the literature. In cases where
the measurements can be collected sequentially according to
a specific order, the number of measurements required for
optimal detection can be significantly reduced. Related works
on this subject can be found in [18]-[21]. However, this
is not the case in the IDS model. Change-point detection
theory  can  be  applied  to  the  problem  of  anomaly  detection 
to  identify  a  change  in  the  probability  distribution  when  a 
malicious  activity  occurs.  Related  works  on  this  subject  can 
be found in [8]-[10]. However, in this paper we consider a
different problem. Here, an intrusion has been detected (by
probing a subnet, for instance [15]). The goal here is to locate
the  infected  components.  During  the  anomaly  localization,  all 
the  observations  are  drawn  from  two  different  distributions 
depending  on  whether  the  component  is  normal  or  anomalous. 
The problem of sequentially testing the simple null hypothesis
$H_0$ versus the simple alternative hypothesis $H_1$ was solved in
[22], [23]. It was shown that the Sequential Probability Ratio
Test (SPRT) minimizes the expected sample size under given
type I and type II error probability constraints. Related works
on SPRT-based solutions for anomaly detection can be found
in [3], [5], [6], [13], [14]. Various problems of sequentially
testing the composite null hypothesis $H_0$ versus the composite
alternative hypothesis $H_1$ were studied in [24]-[30]. In this
case, asymptotically optimal performance can be obtained as
the error probability approaches zero.


In the following, we summarize the main results of this
paper.  We  formulate  the  anomaly  localization  problem  as  a 
priority-based  constrained  optimization  problem.  The  objective 
is  to  minimize  the  expected  weighted  sum  of  completion  times 
of  abnormal  components  (since  normal  components  do  not 
cause damage to the network) subject to error probability constraints.
Minimizing the weighted sum of completion times is
a  natural  criterion  to  prioritize  the  completion  of  high-priority 
components  [31].  The  optimization  is  done  over  the  set  of  all 
possible  selection  rules  (that  the  IDS  uses  to  decide  which 
component  to  test  at  each  time),  stopping  rules  (that  the  IDS 
uses to decide when to stop testing each component) and decision
rules (that the IDS uses to make a decision regarding the
state of each component). We then convert the original optimization
problem to a two-stage optimization problem. The
two-stage formulation allows us to simplify the computation
of  the  original  optimization  problem  by  decomposing  it  into 
two  subproblems.  We  consider  both  independent  and  exclusive 
models.  In  the  former,  each  component  is  abnormal,  with  some 
prior  probability,  independent  of  other  components.  Under  the 
exclusive  model,  one  and  only  one  component  is  abnormal 
with  some  prior  probability  (which  is  a  reasonable  model 
when the probability that each component is compromised is
small).  We  develop  index  policies  under  both  models.  Optimal 
algorithms  are  derived  for  the  simple  hypotheses  case,  where 
the  distribution  is  completely  known  under  both  hypotheses. 
However,  in  numerous  cases  under  the  adversary  model,  there 
is  uncertainty  in  the  observation  distribution  (in  particular 
when  the  component  is  in  an  abnormal  state).  Therefore,  we 
extend  our  results  to  the  case  of  composite  hypotheses,  where 
there  is  uncertainty  in  the  distribution  parameters.  For  this 
case,  asymptotically  (as  the  error  probability  approaches  zero) 
optimal algorithms are derived. In all cases, the algorithms
have low complexity.

The  rest  of  this  paper  is  organized  as  follows.  In  Section 
II  we  describe  the  network  model  and  problem  formulation. 
In Section III we present the two-stage optimization formulation.
In Sections IV and V we derive optimal low-complexity
algorithms under the independent and exclusive models,
respectively, for the simple hypotheses case. In Section VI we extend
our results to derive asymptotically optimal low-complexity
algorithms under the independent and exclusive models for the
composite hypotheses case. In Section VII we provide applications
and numerical examples to illustrate the performance
of the algorithms.

II.  Network  Model  and  Problem  Formulation 

Consider a cyber network consisting of K components.
Assume that an intrusion has been detected. The goal here is
to locate the infected components. Due to resource constraints,
only one component can be probed at each time. When
component k is tested, a sequence of i.i.d. measurements
$\{y_k(i)\}_{i \geq 1}$ is drawn in a one-at-a-time manner. If component
k is in a healthy state, $\{y_k(i)\}_{i \geq 1}$ are drawn from distribution
$f_k^{(0)}$; if component k is abnormal, $\{y_k(i)\}_{i \geq 1}$ are drawn from
distribution $f_k^{(1)}$. We define

$$\mathbf{y}_k(n) = \{y_k(i)\}_{i=1}^{n} \tag{1}$$


as  the  vector  of  observations  for  the  n  samples  that  have  been 
collected  from  component  k. 

Components are assigned priorities. Let $w_k$ ($0 < w_k < \infty$)
be the priority (or weight) of component k. Components with
higher  priorities  in  an  abnormal  state  should  be  fixed  before 
components  with  lower  priorities  to  reduce  the  overall  damage 
to  the  network. 

We  consider  the  case  where  the  switching  cost  is  high.  Thus, 
switching  between  components  is  done  only  when  testing 
the  current  component  is  completed.  The  advantages  of  this 
scheme  are  twofold.  First,  switching  between  components 
typically adds significant delay that should be avoided. Second,
the IDS is required to store observations of only one
component at each time. Thus, this scheme is applicable to
limited-memory systems. For convenience, we define $t_m$ as
the time at which the IDS has completed the $(m-1)$th test
and starts the $m$th test. After each observation has been
collected, the IDS needs to decide whether to take more
measurements from the current component or to finalize the test
on the current component by declaring its state (healthy or
abnormal) and choose the next component to test. Let $\pi_k(t_m)$
be the probability (i.e., belief) that component k is abnormal
at time $t_m$. Let $\mathbf{1}_k(t_m)$ be the testing indicator function,
where $\mathbf{1}_k(t_m) = 1$ if component k is tested at time $t_m$ and
$\mathbf{1}_k(t_m) = 0$ otherwise.

Let $N_k$ be the random sample size required to make a
decision regarding the state of component k. Let $C_k$ be the
random completion time of testing component k. For example,
if the IDS tests component 1 followed by component 2, then

$C_1 = N_1$ and $C_2 = N_1 + N_2$.
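
As a small worked illustration of these definitions, the following Python sketch computes the completion times $C_k$ from a testing order and realized sample sizes $N_k$, and then evaluates the weighted sum over the abnormal components. The function names, the 0-based component indices, and the example numbers are illustrative assumptions, not part of the model.

```python
from itertools import accumulate


def completion_times(order, sample_sizes):
    """Return {component: C_k} when components are tested one after another.

    order        -- testing order, e.g. [0, 1] (0-based indices, an assumption)
    sample_sizes -- sample_sizes[k] is the realized sample size N_k
    """
    finish = accumulate(sample_sizes[k] for k in order)
    return dict(zip(order, finish))


def weighted_completion_cost(order, sample_sizes, weights, abnormal):
    """Weighted sum of completion times over the abnormal components only."""
    c = completion_times(order, sample_sizes)
    return sum(weights[k] * c[k] for k in abnormal)


# Hypothetical numbers: N_1 = 5 and N_2 = 3, stored at indices 0 and 1.
N = [5, 3]
print(completion_times([0, 1], N))  # {0: 5, 1: 8}: C_1 = N_1, C_2 = N_1 + N_2
```

Healthy components contribute to the delay of later components but not to the cost, which is why only the indices in `abnormal` are summed.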

Let $\tau_k$ be a stopping rule, which the IDS uses to decide
whether to take more measurements from component k or to
finalize the test by declaring its state. Let $\tau = (\tau_1, ..., \tau_K)$ be
the vector of stopping rules for the K components.

Let $\delta_k \in \{0, 1\}$ be a decision rule, where $\delta_k = 0$ if the IDS
declares that component k is in a healthy state (i.e., $H_0$), and
$\delta_k = 1$ if the IDS declares that component k is in an abnormal
state (i.e., $H_1$). Let $\delta = (\delta_1, ..., \delta_K)$ be the vector of decision
rules for the K components.

Let $\phi(t_m) \in \{1, 2, ..., K\}$ be a selection rule, indicating
which component is chosen to be tested at time $t_m$. Let
$\phi = (\phi(t_1), ..., \phi(t_K))$ be the vector of selection rules for
the K components.

Let

$$\mathcal{H}_1 = \{k : 1 \leq k \leq K, \text{ component } k \text{ is abnormal}\},$$

$$\mathcal{H}_0 = \{k : 1 \leq k \leq K, \text{ component } k \text{ is healthy}\},$$

be the sets of all the abnormal and healthy components,
respectively.

The problem is to find a selection rule $\phi$, a stopping rule $\tau$
and a decision rule $\delta$ that minimize the expected weighted sum
of completion times of all the abnormal components subject
to error probability constraints for each component:

$$\begin{array}{ll} \inf_{\tau, \delta, \phi} & E\left\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \right\} \\ \text{s.t.} & P_k^{FA} \leq \alpha_k \quad \forall k = 1, ..., K, \\ & P_k^{MD} \leq \beta_k \quad \forall k = 1, ..., K. \end{array} \tag{2}$$


Higher penalties are assigned to higher-priority components in
an abnormal state¹. No penalty is associated with components
in a healthy state since they do not cause damage to the network.
Note that the policy $(\phi, \tau, \delta)$ is dynamic. At each time,
the IDS needs to decide whether to take more measurements
from component k or to finalize the test by declaring its state
and select the next component.

Throughout this paper we develop optimal and asymptotically
optimal algorithms to solve (2) under the simple and
composite hypotheses cases, respectively. The algorithms developed
throughout this paper can be applied to other network
models as well. We discuss these extensions in Section VIII.


III.  Two-stage  Optimization  Problem 


Instead of solving (2) directly, we propose a two-stage
optimization problem. At the first stage, the problem is to find
a stopping rule $\tau_k$ and a decision rule $\delta_k$ for every component
k that minimize the expected sample size given $H_i$ subject to
error probability constraints:

$$\begin{array}{ll} \inf_{\tau_k, \delta_k} & E(N_k | H_i), \quad i = 0, 1 \\ \text{s.t.} & P_k^{FA} \leq \alpha_k, \\ & P_k^{MD} \leq \beta_k. \end{array} \tag{3}$$


For the simple hypotheses case, the solution to the first-stage
optimization problem (3) is given by the SPRT [22], [23].
Let

$$L_k(n) = \prod_{i=1}^{n} \frac{f_k^{(1)}(y_k(i))}{f_k^{(0)}(y_k(i))} \tag{4}$$

be the Likelihood Ratio (LR) between the two hypotheses of
component k at stage n.

Let $A_k, B_k$ ($B_k > 1/A_k$) be the boundary values used by
the SPRT for component k, such that the error constraints are
satisfied². According to the SPRT algorithm, at each stage n,
the LR is compared to the boundary values as follows:


• If $L_k(n) \in (A_k^{-1}, B_k)$, continue to take observations
from component k.

• If $L_k(n) \geq B_k$, stop taking observations from component
k and declare it as abnormal (i.e., $H_1$). Clearly, $N_k = n$.

• If $L_k(n) \leq A_k^{-1}$, stop taking observations from
component k and declare it as normal (i.e., $H_0$). Clearly,
$N_k = n$.
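
The three rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it works with the log-likelihood ratio for numerical stability, and the function names, the Gaussian observation model, and the boundary values in the usage lines are assumptions made for the example.

```python
import math
import random


def sprt(samples, log_lr, A, B):
    """Run an SPRT on an iterable of observations.

    log_lr -- function returning log(f1(y) / f0(y)) for one observation
    A, B   -- boundary values; the test continues while 1/A < L(n) < B
    Returns (decision, n): decision is 1 (abnormal), 0 (normal), or None
    if the samples ran out before a boundary was crossed.
    """
    upper, lower = math.log(B), -math.log(A)
    log_l = 0.0
    n = 0
    for n, y in enumerate(samples, start=1):
        log_l += log_lr(y)  # accumulate the log-likelihood ratio
        if log_l >= upper:
            return 1, n
        if log_l <= lower:
            return 0, n
    return None, n


def gaussian_log_lr(y, mu0=0.0, mu1=1.0, sigma=1.0):
    """log(f1/f0) for N(mu1, sigma^2) versus N(mu0, sigma^2) (assumed model)."""
    return ((y - mu0) ** 2 - (y - mu1) ** 2) / (2.0 * sigma ** 2)


# Hypothetical usage: observations drawn from the abnormal distribution.
rng = random.Random(0)
stream = (rng.gauss(1.0, 1.0) for _ in range(10_000))
decision, n = sprt(stream, gaussian_log_lr, A=99.0, B=99.0)
print(decision, n)
```

The returned `n` plays the role of the sample size $N_k$ for the tested component.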


¹ Note that the loss due to missed-detection events is negligible for small
error probability, since $P_k^{MD} \sim O(1/B_k)$ and $E(N_k) \sim O(\log B_k)$, where
$B_k$ is a boundary value of the sequential test [23], [27].

² We discuss the determination of the boundary values $A_k$, $B_k$ in Remark 1.

Remark 1: Implementing sequential tests requires computing
boundary values to determine the stopping rule, such that error
constraints are satisfied. In general, the exact determination
of the boundary values is very laborious and depends on the
observation distribution. However, since the solution to (3) is
given by the SPRT, Wald’s approximation can be applied to
simplify the computation [23]:


$$B_k \approx \frac{1 - \beta_k}{\alpha_k}, \qquad A_k \approx \frac{1 - \alpha_k}{\beta_k}. \tag{5}$$


Wald’s approximation performs well for small $\alpha_k, \beta_k$. Since
type I and type II errors are typically small, Wald’s approximation
is widely used in practice [23].
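
For concreteness, Wald’s approximation to the boundary values can be computed as in the short Python sketch below. The function name and the example error targets are hypothetical; the formulas are $A_k \approx (1-\alpha_k)/\beta_k$ and $B_k \approx (1-\beta_k)/\alpha_k$.

```python
def wald_boundaries(alpha, beta):
    """Wald's approximation: A ~ (1 - alpha)/beta, B ~ (1 - beta)/alpha.

    alpha -- target type I (false-alarm) error probability
    beta  -- target type II (missed-detection) error probability
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie in (0, 1)")
    A = (1.0 - alpha) / beta
    B = (1.0 - beta) / alpha
    return A, B


# Hypothetical error targets for one component.
A, B = wald_boundaries(alpha=0.01, beta=0.05)
print(A, B)
```

Tighter error targets push both boundaries outward, which lengthens the test roughly in proportion to the logarithm of the boundaries.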

For the composite hypotheses case, where there is uncertainty
in the distribution parameters, we can obtain an asymptotically
optimal solution to (3). This case is discussed in Section VI.

At the second stage, the problem is to find a selection rule
$\phi$ that minimizes the objective function, given the solution to
the K subproblems (3):

$$\begin{array}{ll} \inf_{\phi} & E\left\{ \sum_{k \in \mathcal{H}_1} w_k C_k \right\} \\ \text{s.t.} & \text{solutions to (3) are given for } k = 1, ..., K. \end{array} \tag{6}$$


The  solutions  to  the  second-stage  optimization  problem  for 
the  independent  and  exclusive  models  are  given  in  Sections 
IV  and  V,  respectively. 

The  formulation  of  the  two-stage  optimization  problem 
allows  us  to  decompose  the  original  optimization  problem  (2) 
into  K  +  1  subproblems  (3)  and  (6).  We  use  this  formulation 
to  design  the  solution  to  (2).  In  subsequent  sections  we  show 
that for the simple hypotheses case the solution to the two-stage
optimization problem solves the original optimization
problem  (2)  under  both  independent  and  exclusive  models.  For 
the  composite  hypotheses  case,  the  solution  to  the  two-stage 
optimization  problem  asymptotically  (as  the  error  probability 
approaches  zero)  solves  the  original  optimization  problem 
under  both  independent  and  exclusive  models. 


IV. The Independent Model Case

In this section we consider the independent model under the
simple hypotheses case. Under the independent model, each
component is abnormal independently of other components. The
posterior probability of component k being abnormal can be
updated at time $t_{m+1}$ as follows:

$$\pi_k(t_{m+1}) = \frac{\mathbf{1}_k(t_m) \pi_k(t_m) f_k^{(1)}(\mathbf{y}_k(N_k))}{\pi_k(t_m) f_k^{(1)}(\mathbf{y}_k(N_k)) + (1 - \pi_k(t_m)) f_k^{(0)}(\mathbf{y}_k(N_k))} + (1 - \mathbf{1}_k(t_m)) \pi_k(t_m). \tag{7}$$

In the following we derive an optimal low-complexity algorithm
for this case.
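
The belief update in (7) is a Bayes step for the tested component and a no-op for every other component. A minimal Python sketch follows; the function name and argument layout are assumptions, and the likelihood values are taken as already evaluated on the collected observations.

```python
def update_belief(prior, tested, lik1=None, lik0=None):
    """Belief update for one component under the independent model.

    prior  -- current probability that the component is abnormal
    tested -- True if this component was the one tested
    lik1   -- likelihood of the collected observations under H_1
    lik0   -- likelihood of the same observations under H_0
    """
    if not tested:
        return prior  # untested components keep their belief
    num = prior * lik1
    return num / (num + (1.0 - prior) * lik0)


# Hypothetical numbers: observations nine times more likely under H_1.
print(update_belief(0.2, True, lik1=0.9, lik0=0.1))
```

When the two likelihoods are equal the observations carry no information and the belief is unchanged.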


A.  The  Proposed  Solution 

We  use  the  two-stage  optimization  problem  to  design  the 
solution  to  (2).  For  the  simple  hypotheses  case,  the  solution  to 
the first-stage optimization problem (3) is given by the SPRT,
discussed in Section III. Thus, here we focus on the solution
to the second-stage optimization problem (6).

It  was  shown  in  [31]  that  the  optimal  selection  rule  for 
the  problem  of  minimizing  the  expected  weighted  sum  of 
completion  times  given  the  expected  testing  time  of  each 
component  is  to  select  the  components  in  decreasing  order 
of $w_k/E(N_k)$. However, the problem in (6) is different. First,
the  objective  is  to  minimize  the  expected  weighted  sum  of 
completion  times  of  abnormal  components  only.  Second,  the 
expected  sample  size  depends  on  the  component  state.  In  what 
follows  we  derive  a  modified  optimal  selection  rule  that  solves 
the  second-stage  optimization  problem  (6). 

Theorem 1: Let $E(N_k)$ be the solution to (3). A selection
rule $\phi^*$ that selects the components in decreasing order of
$\pi_k(t_1) w_k / E(N_k)$ solves the second-stage optimization problem
(6).

Proof: The theorem follows from the proof of Theorem 2. ■
Remark 2: The solution to the second-stage optimization
problem (6) requires one to compute the expected sample size
$E(N_k)$ for all k = 1, 2, ..., K in order to select the components in
decreasing order of $\pi_k(t_1) w_k / E(N_k)$. In general, it is difficult
to obtain a closed-form expression for $E(N_k)$. However, since
the solution to (3) is given by the SPRT, Wald’s approximation
can be applied to simplify the computation [23]. For every
i, j = 0, 1, let

$$D_k(i \| j) \triangleq E_i\left( \log \frac{f_k^{(i)}(y_k(1))}{f_k^{(j)}(y_k(1))} \right) \tag{8}$$

be the Kullback-Leibler (KL) divergence between the hypotheses
$H_i$ and $H_j$, where the expectation is taken with respect to
$f_k^{(i)}$.

The expected sample size is well approximated by [23]:

$$E(N_k | H_0) \approx \frac{(1 - \alpha_k) \log A_k - \alpha_k \log B_k}{D_k(0 \| 1)}, \qquad E(N_k | H_1) \approx \frac{(1 - \beta_k) \log B_k - \beta_k \log A_k}{D_k(1 \| 0)}, \tag{9}$$

where $A_k = (1 - \alpha_k)/\beta_k$, $B_k = (1 - \beta_k)/\alpha_k$ are the
approximations to the boundary values given in (5).

Thus, at each time t, the expected sample size required to
make a decision regarding the state of component k is given
by:

$$E(N_k) = \pi_k(t) E(N_k | H_1) + (1 - \pi_k(t)) E(N_k | H_0), \tag{10}$$

where the approximation approaches the exact expected sample
size for small $\alpha_k, \beta_k$. Since type I and type II errors
are typically small, Wald’s approximation is widely used in
practice [23].

Based on the solution to the two-stage optimization problem,
we propose Algorithm 1, presented in Table I, to solve
(2). Sorting the components in step 1 can be done in $O(K \log K)$
time via sorting algorithms. Then, a series of SPRTs is
performed according to this order until all the components are
tested. The index policy described in Algorithm 1 is intuitively
satisfying. The priority of component k in terms of testing
order should be higher as the weight $w_k$ increases, or the
probability to be abnormal $\pi_k(t_1)$ increases, or the expected
sample size $E(N_k)$ decreases (since $E(N_k)$ is added to the
completion time of every component which is tested after
component k). The SPRT is used to minimize the expected
sample size to reduce the completion times.

TABLE I

Algorithm 1 for the independent model

1. arrange the components in decreasing
   order of $\pi_k(t_1) w_k / E(N_k)$
2. for k = 1, ..., K components do:
3.     perform SPRT for component k
       with $P_k^{FA} \leq \alpha_k$, $P_k^{MD} \leq \beta_k$
4. end for

B. Optimality of Algorithm 1

In this section we provide performance analysis of Algorithm
1. Note that Algorithm 1 uses a static selection rule (as
stated in step 1), where the component order is predetermined
at time $t_1$. However, the performance analysis in this section is
not restricted to static selection rules. The following theorem
shows that Algorithm 1 is optimal among the class of both
static and dynamic selection rules (that update the selection
dynamically at each time $t_k$).

Theorem 2: Under the independent model, Algorithm 1
solves (2).


Proof: Let $E'(N_k | H_i, t)$ be the expected sample size achieved
by a stopping rule and a decision rule $(\tau'_k(t), \delta'_k(t))$, depending
on the time that component k is tested (i.e., $(\tau'_k(t), \delta'_k(t))$
depend on the selection rule), such that error constraints
are satisfied. Let $E^{A1}(N_k | H_i)$ be the expected sample size
achieved by the SPRT’s stopping rule and decision rule
$(\tau^{A1}, \delta^{A1})$, independent of the time that component k is tested
(i.e., $(\tau^{A1}, \delta^{A1})$ are independent of the selection rule), such
that error constraints are satisfied. Clearly, $E^{A1}(N_k | H_i) \leq
E'(N_k | H_i, t)$ for all k, t, for i = 0, 1, and these values are achieved by
Algorithm 1.

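First, consider the case where K = 2. Assume that

$$\frac{\pi_1(t_1) w_1}{E^{A1}(N_1)} \geq \frac{\pi_2(t_1) w_2}{E^{A1}(N_2)}.$$

Consider selection rules $\phi^{(1)}$, $\phi^{(2)}$ that select component 1
first followed by component 2 and component 2 first followed
by component 1, respectively. The expected weighted sum of
completion times achieved by $(\tau'(t), \delta'(t), \phi^{(2)})$ is given by:

$$\begin{aligned} E&\left\{ \sum_{k=1}^{2} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau'(t), \delta'(t), \phi^{(2)}) \right\} \\ &= \left( E'(N_2 | H_1, t_1) \right) \pi_2(t_1) w_2 \\ &\quad + \left( E'(N_2 | t_1) + E'(N_1 | H_1, t_2) \right) \pi_1(t_1) w_1. \end{aligned} \tag{11}$$

The expected weighted sum of completion times achieved by
$(\tau'(t), \delta'(t), \phi^{(1)})$ is given by:

$$\begin{aligned} E&\left\{ \sum_{k=1}^{2} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau'(t), \delta'(t), \phi^{(1)}) \right\} \\ &= \left( E'(N_1 | H_1, t_1) \right) \pi_1(t_1) w_1 \\ &\quad + \left( E'(N_1 | t_1) + E'(N_2 | H_1, t_2) \right) \pi_2(t_1) w_2. \end{aligned} \tag{12}$$

Note that the expected weighted sum of completion times
achieved by both selection rules can be further reduced
by minimizing the expected sample sizes (such that error
constraints are satisfied) independent of the selection rules,
which is achieved by $(\tau^{A1}, \delta^{A1})$. Therefore, an optimal solution
has to be $(\tau^{A1}, \delta^{A1}, \phi^{(1)})$ or $(\tau^{A1}, \delta^{A1}, \phi^{(2)})$. Next, we use
the interchange argument to prove the theorem for K = 2.
The expected weighted sum of completion times achieved by
$(\tau^{A1}, \delta^{A1}, \phi^{(2)})$ is given by:

$$\begin{aligned} E&\left\{ \sum_{k=1}^{2} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A1}, \delta^{A1}, \phi^{(2)}) \right\} \\ &= \left( E^{A1}(N_2 | H_1) \right) \pi_2(t_1) w_2 \\ &\quad + \left( E^{A1}(N_2) + E^{A1}(N_1 | H_1) \right) \pi_1(t_1) w_1. \end{aligned} \tag{13}$$

The expected weighted sum of completion times achieved by
$(\tau^{A1}, \delta^{A1}, \phi^{(1)})$ is given by:

$$\begin{aligned} E&\left\{ \sum_{k=1}^{2} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A1}, \delta^{A1}, \phi^{(1)}) \right\} \\ &= \left( E^{A1}(N_1 | H_1) \right) \pi_1(t_1) w_1 \\ &\quad + \left( E^{A1}(N_1) + E^{A1}(N_2 | H_1) \right) \pi_2(t_1) w_2. \end{aligned} \tag{14}$$

The expected weighted sum of completion times achieved by
$\phi^{(1)}$ is no greater than the expected weighted sum of completion
times achieved by $\phi^{(2)}$, since
$\pi_1(t_1) w_1 / E^{A1}(N_1) \geq \pi_2(t_1) w_2 / E^{A1}(N_2)$, which
completes the proof for K = 2.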
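
The K = 2 interchange step can be checked numerically. The Python sketch below evaluates the two costs in (13) and (14) for hypothetical components and confirms that testing the component with the larger index first is cheaper. The dictionary layout (`pi`, `w`, `n1`, `n0` for the belief, the weight, and the expected sample sizes under $H_1$ and $H_0$) and all numbers are assumptions made for illustration.

```python
def mean_size(c):
    """E(N_k) as in (10): mixture of E(N_k|H_1) and E(N_k|H_0)."""
    return c["pi"] * c["n1"] + (1.0 - c["pi"]) * c["n0"]


def two_component_cost(first, second):
    """Expected weighted cost, as in (13)-(14), when `first` is tested first."""
    cost_first = first["n1"] * first["pi"] * first["w"]
    # The second component waits for the whole first test, whatever its outcome.
    cost_second = (mean_size(first) + second["n1"]) * second["pi"] * second["w"]
    return cost_first + cost_second


def index(c):
    """Index used by the selection rule: pi * w / E(N)."""
    return c["pi"] * c["w"] / mean_size(c)


c1 = {"pi": 0.5, "w": 2.0, "n1": 10.0, "n0": 20.0}
c2 = {"pi": 0.2, "w": 1.0, "n1": 30.0, "n0": 10.0}

print(index(c1) >= index(c2))      # True: component 1 has the larger index
print(two_component_cost(c1, c2))  # 19.0
print(two_component_cost(c2, c1))  # 30.0
```

The difference between the two costs equals $E(N_2) \pi_1 w_1 - E(N_1) \pi_2 w_2$, which is nonnegative exactly when component 1 has the larger index.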

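Next, we prove the theorem by induction on the number of
components K. Assume that the theorem is true for $K - 1$
components. Assume that

$$\frac{\pi_1(t_1) w_1}{E^{A1}(N_1)} \geq \frac{\pi_2(t_1) w_2}{E^{A1}(N_2)} \geq \cdots \geq \frac{\pi_K(t_1) w_K}{E^{A1}(N_K)}.$$

Consider an optimal selection rule $\phi^{(j)}$ that selects component
j first. Due to the independence between components, it
can be verified by the induction hypothesis that the last
$K - 1$ components have to be selected in decreasing order
of $\pi_k(t_1) w_k / E^{A1}(N_k)$ and tested by the SPRT. Hence, the
expected weighted sum of completion times achieved by
$(\tau'(t), \delta'(t), \phi^{(j)})$ is given by:

$$\begin{aligned} E&\left\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau'(t), \delta'(t), \phi^{(j)}) \right\} \\ &= \pi_j(t_1) w_j \left( E'(N_j | H_1, t_1) \right) \\ &\quad + \sum_{k=1, k \neq j}^{K} \left[ \pi_k(t_1) w_k \times \left( E'(N_j | t_1) + \sum_{i=1, i \neq j}^{k-1} E^{A1}(N_i) + E^{A1}(N_k | H_1) \right) \right]. \end{aligned} \tag{15}$$

First, note that the expected weighted sum of completion times
achieved by $(\tau'(t), \delta'(t), \phi^{(j)})$ can be further reduced for all
j by minimizing the expected sample size $E'(N_j | H_i, t_1)$ for
i = 0, 1, which is achieved by $(\tau^{A1}, \delta^{A1})$. Therefore, an optimal
solution has to be $(\tau^{A1}, \delta^{A1}, \phi^{(j)})$ for an optimal selection
rule $\phi^{(j)}$. Thus, in the following we consider solutions of the
form $(\tau^{A1}, \delta^{A1}, \phi)$.

Next, by contradiction, consider an optimal selection rule
$\phi^{(j)}$ that selects component $j \neq 1$ first. Therefore, $\phi^{(j)}$
selects the components by the following order:

$j, 1, 2, ..., j - 1, j + 1, ..., K$.

As a result, the expected weighted sum of completion times
achieved by $(\tau^{A1}, \delta^{A1}, \phi^{(j)})$ is given by:

$$\begin{aligned} E&\left\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A1}, \delta^{A1}, \phi^{(j)}) \right\} \\ &= \pi_j(t_1) w_j \left( E^{A1}(N_j | H_1) \right) \\ &\quad + \pi_1(t_1) w_1 \left[ E^{A1}(N_j) + E^{A1}(N_1 | H_1) \right] \\ &\quad + \sum_{k=2, k \neq j}^{K} \left[ \pi_k(t_1) w_k \times \left( E^{A1}(N_j) + \sum_{i=1, i \neq j}^{k-1} E^{A1}(N_i) + E^{A1}(N_k | H_1) \right) \right]. \end{aligned} \tag{16}$$
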
We  use  the  interchange  argument  to  prove  the  theorem. 
Consider  a  selection  rule  (j)' 1 '  that  selects  component  1  first 
followed  by  components  j,  2, 3,  j  —  1,  j  +  1, ...,  K.  Similar  to 
(16),  the  expected  weighted  sum  of  completion  times  achieved 
by  (ta\<5a1,0(1))  is  given  by: 

E{x><*i{**o  |  (TA1,<5A1,<E>)j 

=  (^A1((Vi|ifi)) 

[EA1  (JVi)  +  EA1  (Nj\Hi)] 

K 

+  E  lnk(tl)wkX 

k=2  ,k^j 

/  k-1 

EA1  (Nj)  +  (  E  eA1  W)  I  +  eA1  (Nk\Hi) 

/ 

By  comparing  (16)  and  (17),  it  can  be  verified  that: 


(17) 


WkCkl{ke'Hi}  I  (' 


rA1,SA1,cp^ 


TABLE  II 

Algorithm  2  for  the  exclusive  model 


k 


<  E  <  y^mfcCfclf 


fc£«l) 


/_A1  cAl 
(r  , d  ,  0 


0¥i)'l 


„fc=i 


since  that  7Ti(ti)«;i/.Elj41(./Vi)  >  irj(ti)wj/EA1(Nj)  . 

The  expected  weighted  sum  of  completion  times  can  be 
reduced  by  selecting  component  1  first  followed  by  component 
j,  which  contradicts  the  optimality  of  (p^^.  Hence,  at  time 
t\  selecting  component  1  minimizes  the  expected  weighted 
sum  of  completion  times.  By  the  induction  hypothesis,  for  the 
last  K-l  components  we  select  the  components  in  decreasing 
order  of  ttk(ti)wk/ EA1{Nk ),  which  completes  the  proof.  ■ 

V.  The  Exclusive  Model  Case 

In  this  section  we  consider  the  exclusive  model  under  the 
simple  hypotheses  case.  Under  the  exclusive  model,  one  and 
only  one  component  is  abnormal.  The  posterior  probability 
of  component  k  being  abnormal  is  updated  at  time  Lm+i  as 
given  in  (18)  on  the  next  page.  It  is  easy  to  see  that  under 
the  exclusive  model,  we  have  X^fcL=i7rfc(^)  =  1-  Note  that 
in  contrast  to  the  independent  model,  under  the  exclusive 
model  the  beliefs  of  all  the  components  are  changed  at  each 
time  due  to  the  dependency  across  components.  The  posterior 
probabilities  depend  on  the  selection  rule  and  the  collected 
measurements.  Nevertheless,  in  what  follows  we  propose  an 
optimal  low-complexity  algorithm  to  solve  (2)  based  on  the 
two-stage  optimization  problem  (3),  (6).  In  section  V-B  we 
provide  an  optimality  analysis. 

A.  The  Proposed  Solution 

We  use  the  two-stage  optimization  problem  to  design  the 
solution  to  (2).  For  the  simple  hypotheses  case,  the  solution  to 
the  first-stage  optimization  problem  (3)  is  given  by  the  SPRT, 
discussed  in  section  III.  Thus,  here  we  focus  on  the  solution 
to  the  second-stage  optimization  problem  (6).  In  section  IV-A, 
we  showed  that  selecting  the  components  in  decreasing  order 
of  Ttk{t\)wk/ E{Nk)  solves  (6)  under  the  independent  model. 
In  the  following  we  show  that  a  different  selection  rule  solves 
(6)  under  the  exclusive  model. 

Theorem  3:  Let  E(Nk\Hi),  i  =  0, 1  be  the  solution  to  (3). 
A  selection  rule  (p*  that  selects  the  components  in  decreasing 
order  of  nk(ti)wk/E(Nk\Ho)  solves  the  second-stage  opti¬ 
mization  problem  (6). 

Proof: The theorem follows from the proof of Theorem 4. ■

Based on the solution to the two-stage optimization problem,
we propose Algorithm 2, presented in Table II, to solve
(2). The index policy described in Algorithm 2 is intuitively
satisfying. The priority of component $k$ in terms of testing
order should be higher as the weight $w_k$ increases, or the
probability of being abnormal $\pi_k(t_1)$ increases, or the expected
sample size $E(N_k|H_0)$ decreases. Note that in contrast to the
independent model, here we take into account the expected
sample size under $H_0$ solely. The reason is that if component
$k$ is abnormal, there is no penalty to other components under
the exclusive model (since only one component is abnormal).
On the other hand, if component $k$ is healthy, then $E(N_k|H_0)$
is added to the completion time of the components which are
tested after component $k$ (and may be abnormal). The SPRT
is used to minimize the expected sample size to reduce the
completion times.

TABLE II

Algorithm 2 for the exclusive model

1. arrange the components in decreasing
   order of $\pi_k(t_1) w_k / E(N_k|H_0)$
2. for k = 1, ..., K components do:
3.   perform SPRT for component k,
     with $P_k^{FA} \le \alpha_k$, $P_k^{MD} \le \beta_k$
4. end for
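The ordering step of Algorithm 2 can be sketched in a few lines of Python. This is only an illustration: the function name `exclusive_index_order` and the numeric inputs are assumptions made for the example, and the expected sample sizes $E(N_k|H_0)$ are taken as given rather than computed from (3).

```python
from typing import List, Sequence


def exclusive_index_order(
    prior: Sequence[float],
    weight: Sequence[float],
    n_h0: Sequence[float],
) -> List[int]:
    """Order components by decreasing pi_k(t_1) * w_k / E(N_k | H_0)."""
    if not (len(prior) == len(weight) == len(n_h0)):
        raise ValueError("all inputs must have one entry per component")
    index = [p * w / n for p, w, n in zip(prior, weight, n_h0)]
    # sorted() is stable, so tied components keep their original order.
    return sorted(range(len(index)), key=lambda k: index[k], reverse=True)


# Illustrative numbers only (hypothetical, not taken from the paper).
order = exclusive_index_order(
    prior=[0.2, 0.5, 0.3],
    weight=[1.0, 1.0, 4.0],
    n_h0=[10.0, 20.0, 15.0],
)
print(order)
```

In this toy instance the third component is tested first even though it is not the most likely to be abnormal, because its weight is large relative to its expected sample size under $H_0$.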

B.  Optimality  of  Algorithm  2 

In this section we provide a performance analysis of
Algorithm 2. Note that Algorithm 2 uses a static selection rule (as
stated in step 1), where the component order is predetermined
at time $t_1$. However, the performance analysis in this section is
not restricted to static selection rules. The following theorem
shows that Algorithm 2 is optimal among the class of both
static and dynamic selection rules (that update the selection
dynamically at each time $t_k$).

Theorem 4: Under the exclusive model, Algorithm 2 solves (2).

Proof: Let $E'(N_k|H_i, t)$ be the expected sample size achieved
by a stopping rule and a decision rule $(\tau'_k(t), \delta'_k(t))$, depending
on the time that component $k$ is tested (i.e., $(\tau'_k(t), \delta'_k(t))$
depend on the selection rule), such that the error constraints
are satisfied. Let $E^{A2}(N_k|H_i)$ be the expected sample size
achieved by the SPRT's stopping rule and decision rule
$(\tau^{A2}, \delta^{A2})$, independent of the time that component $k$ is tested
(i.e., $(\tau^{A2}, \delta^{A2})$ are independent of the selection rule), such
that the error constraints are satisfied. Clearly, $E^{A2}(N_k|H_i) \le
E'(N_k|H_i, t)$ for all $k, t$, for $i = 0, 1$.

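First consider the case where $K = 2$. Assume that

$$\frac{\pi_1(t_1) w_1}{E^{A2}(N_1|H_0)} \ge \frac{\pi_2(t_1) w_2}{E^{A2}(N_2|H_0)}. \tag{19}$$

Consider selection rules $\phi^{(1)}$, $\phi^{(2)}$ that select component 1
first followed by component 2, and component 2 first followed
by component 1, respectively. The expected weighted sum of
completion times achieved by $(\tau'(t), \delta'(t), \phi^{(2)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau'(t), \delta'(t), \phi^{(2)}) \Big\}
&= \big(E'(N_2|H_1, t_1)\big) \pi_2(t_1) w_2 \\
&\quad + \big(E'(N_2|H_0, t_1) + E'(N_1|H_1, t_2)\big) \pi_1(t_1) w_1.
\end{aligned} \tag{20}$$

The expected weighted sum of completion times achieved by
$(\tau'(t), \delta'(t), \phi^{(1)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau'(t), \delta'(t), \phi^{(1)}) \Big\}
&= \big(E'(N_1|H_1, t_1)\big) \pi_1(t_1) w_1 \\
&\quad + \big(E'(N_1|H_0, t_1) + E'(N_2|H_1, t_2)\big) \pi_2(t_1) w_2.
\end{aligned} \tag{21}$$

Note that the expected weighted sum of completion times
achieved by both selection rules can be further reduced
by minimizing the expected sample sizes (such that the error
constraints are satisfied) independent of the selection rules,
which is achieved by $(\tau^{A2}, \delta^{A2})$. Therefore, an optimal solution
has to be $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ or $(\tau^{A2}, \delta^{A2}, \phi^{(2)})$. Next, we use
the interchange argument to prove the theorem for $K = 2$.
The expected weighted sum of completion times achieved by
$(\tau^{A2}, \delta^{A2}, \phi^{(2)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A2}, \delta^{A2}, \phi^{(2)}) \Big\}
&= \big(E^{A2}(N_2|H_1)\big) \pi_2(t_1) w_2 \\
&\quad + \big(E^{A2}(N_2|H_0) + E^{A2}(N_1|H_1)\big) \pi_1(t_1) w_1.
\end{aligned} \tag{22}$$

The expected weighted sum of completion times achieved by
$(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A2}, \delta^{A2}, \phi^{(1)}) \Big\}
&= \big(E^{A2}(N_1|H_1)\big) \pi_1(t_1) w_1 \\
&\quad + \big(E^{A2}(N_1|H_0) + E^{A2}(N_2|H_1)\big) \pi_2(t_1) w_2.
\end{aligned} \tag{23}$$

The expected weighted sum of completion times achieved
by $\phi^{(1)}$ is lower than the expected weighted sum of
completion times achieved by $\phi^{(2)}$, since
$\frac{\pi_1(t_1) w_1}{E^{A2}(N_1|H_0)} \ge \frac{\pi_2(t_1) w_2}{E^{A2}(N_2|H_0)}$,
which completes the proof for $K = 2$.

$$\begin{aligned}
\pi_k(t_{m+1}) &= \mathbf{1}_k(t_m) \, \frac{\pi_k(t_m) f_k^{(1)}(\mathbf{y}_k(N_k))}{\pi_k(t_m) f_k^{(1)}(\mathbf{y}_k(N_k)) + (1 - \pi_k(t_m)) f_k^{(0)}(\mathbf{y}_k(N_k))} \\
&\quad + (1 - \mathbf{1}_k(t_m)) \, \frac{\pi_k(t_m) f_{\phi(t_m)}^{(0)}(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)}))}{\pi_{\phi(t_m)}(t_m) f_{\phi(t_m)}^{(1)}(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)})) + (1 - \pi_{\phi(t_m)}(t_m)) f_{\phi(t_m)}^{(0)}(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)}))}
\end{aligned} \tag{18}$$

Next, we prove the theorem by induction on the number of
components $K$. Assume that the theorem is true for $K - 1$
components (where one and only one component is abnormal).
Assume that

$$\frac{\pi_1(t_1) w_1}{E^{A2}(N_1|H_0)} \ge \frac{\pi_2(t_1) w_2}{E^{A2}(N_2|H_0)} \ge \cdots \ge \frac{\pi_K(t_1) w_K}{E^{A2}(N_K|H_0)}. \tag{24}$$

Consider an optimal selection rule $\phi^{(j)}$ that selects component
$j$ first. Let

$$\gamma_j(t) = \frac{f_j^{(0)}(\mathbf{y}_j(N_j))}{\pi_j(t) f_j^{(1)}(\mathbf{y}_j(N_j)) + (1 - \pi_j(t)) f_j^{(0)}(\mathbf{y}_j(N_j))}. \tag{25}$$

Note that when the IDS completes testing component $j$, the
other components update their beliefs according to:

$$\pi_k(t_2) = \gamma_j(t_1) \pi_k(t_1), \quad \forall k \neq j. \tag{26}$$

The expected weighted sum of completion times achieved by
$\phi^{(j)}$ given the outcome (at time $t_2$) of testing component $j$
(i.e., given the observation vector $\mathbf{y}_j(N_j)$) is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j) \Big\}
&= \pi_j(t_2) w_j N_j \\
&\quad + (1 - \pi_j(t_2)) \times E\Big\{ \sum_{k=1, k \neq j}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0 \Big\}.
\end{aligned} \tag{27}$$

Let

$$\tilde{C}_k = C_k - N_j \tag{28}$$

be the modified completion time, defined as the completion
time from $t = N_j + 1$ until testing component $k$ is completed.
Thus, we can rewrite (27) as:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j) \Big\}
&= \sum_{k=1}^{K} \pi_k(t_2) w_k N_j \\
&\quad + (1 - \pi_j(t_2)) \times E\Big\{ \sum_{k=1, k \neq j}^{K} w_k \tilde{C}_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0 \Big\}.
\end{aligned} \tag{29}$$

The term $\pi_k(t_2) w_k N_j$ in (29) follows since

$$\begin{aligned}
\Pr\big(k \in \mathcal{H}_1 \,\big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0\big)
&= \frac{\Pr\big(k \in \mathcal{H}_1, j \in \mathcal{H}_0 \,\big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\big)}{\Pr\big(j \in \mathcal{H}_0 \,\big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\big)} \\
&= \frac{\Pr\big(k \in \mathcal{H}_1 \,\big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\big)}{\Pr\big(j \in \mathcal{H}_0 \,\big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\big)}
= \frac{\pi_k(t_2)}{1 - \pi_j(t_2)} = \tilde{\pi}_k(t_2).
\end{aligned} \tag{30}$$

Minimizing

$$E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j) \Big\} \tag{31}$$

at time $t_2$ requires minimizing

$$E\Big\{ \sum_{k=1, k \neq j}^{K} w_k \tilde{C}_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0 \Big\} \tag{32}$$

in (29).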

Note that (32) is the expected weighted sum of completion
times for $K - 1$ components (where one and only one
component is abnormal) starting at time $t = t_2 = N_j + 1$,
with prior probability $\tilde{\pi}_k(t_2) = \pi_k(t_2) / (1 - \pi_j(t_2))$ for
component $k \neq j$ to be abnormal. By the induction hypothesis, for any
optimal selection rule $\phi^{(j)}$ that selects component $j$ first,
arranging the last $K - 1$ components in decreasing order
of $\tilde{\pi}_k(t_2) w_k / E^{A2}(N_k|H_0)$ (and testing them by the SPRT)
minimizes (32).

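Since

$$\tilde{\pi}_k(t_2) = \frac{\gamma_j(t_1)}{1 - \pi_j(t_2)} \, \pi_k(t_1) \quad \forall k \neq j, \tag{33}$$

then

$$\begin{aligned}
\frac{\tilde{\pi}_1(t_2) w_1}{E^{A2}(N_1|H_0)} &\ge \frac{\tilde{\pi}_2(t_2) w_2}{E^{A2}(N_2|H_0)} \ge \cdots \ge \frac{\tilde{\pi}_{j-1}(t_2) w_{j-1}}{E^{A2}(N_{j-1}|H_0)} \\
&\ge \frac{\tilde{\pi}_{j+1}(t_2) w_{j+1}}{E^{A2}(N_{j+1}|H_0)} \ge \cdots \ge \frac{\tilde{\pi}_K(t_2) w_K}{E^{A2}(N_K|H_0)}.
\end{aligned} \tag{34}$$

Thus, the last $K - 1$ components have to be selected in
decreasing order of $\pi_k(t_1) w_k / E^{A2}(N_k|H_0)$ and tested by the
SPRT.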
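The belief update used in this step can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the helper names `exclusive_update` and `condition_on_healthy` are hypothetical, and the likelihood values `lik0`, `lik1` stand in for $f_j^{(0)}(\mathbf{y}_j(N_j))$ and $f_j^{(1)}(\mathbf{y}_j(N_j))$. It shows that conditioning on component $j$ being healthy rescales all remaining beliefs by the same factor, so their relative order (and hence the index order) does not change.

```python
from typing import List, Sequence


def exclusive_update(
    prior: Sequence[float], j: int, lik0: float, lik1: float
) -> List[float]:
    """Posterior beliefs after testing component j under the exclusive model."""
    denom = prior[j] * lik1 + (1.0 - prior[j]) * lik0
    gamma = lik0 / denom                    # common scaling factor for k != j
    post = [gamma * p for p in prior]       # update of the untested components
    post[j] = prior[j] * lik1 / denom       # Bayes update of the tested one
    return post


def condition_on_healthy(post: Sequence[float], j: int) -> List[float]:
    """Beliefs of the remaining components given that component j is healthy."""
    return [0.0 if k == j else p / (1.0 - post[j]) for k, p in enumerate(post)]


# Illustrative numbers only (hypothetical).
prior = [0.5, 0.3, 0.2]
post = exclusive_update(prior, j=0, lik0=0.8, lik1=0.2)
tilde = condition_on_healthy(post, j=0)
print(post, tilde)
```

The conditioned beliefs equal `prior[k] / (1 - prior[0])`, independent of the observed likelihoods, which is why the predetermined order of the remaining components stays optimal.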

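Hence, the expected weighted sum of completion times
achieved by $(\tau'(t), \delta'(t), \phi^{(j)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau'(t), \delta'(t), \phi^{(j)}) \Big\}
&= \pi_j(t_1) w_j \big(E'(N_j|H_1, t_1)\big) \\
&\quad + \sum_{k=1, k \neq j}^{K} \Big[ \pi_k(t_1) w_k \times \Big( E'(N_j|H_0, t_1) + \sum_{i=1, i \neq j}^{k-1} E^{A2}(N_i|H_0) + E^{A2}(N_k|H_1) \Big) \Big].
\end{aligned} \tag{35}$$

First, note that the expected weighted sum of completion times
achieved by $(\tau'(t), \delta'(t), \phi^{(j)})$ can be further reduced for all
$j$ by minimizing the expected sample size $E'(N_j|H_i, t_1)$ for
$i = 0, 1$, which is achieved by $(\tau^{A2}, \delta^{A2})$. Therefore, an optimal
solution has to be $(\tau^{A2}, \delta^{A2}, \phi^{(j)})$ for an optimal selection
rule $\phi^{(j)}$. Thus, in the following we consider solutions of the
form $(\tau^{A2}, \delta^{A2}, \phi)$.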

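Next, by contradiction, consider an optimal selection rule
$\phi^{(j)}$ that selects component $j \neq 1$ first. Therefore, $\phi^{(j)}$
selects the components in the following order:

$j, 1, 2, \ldots, j - 1, j + 1, \ldots, K$.

As a result, the expected weighted sum of completion times
achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(j)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A2}, \delta^{A2}, \phi^{(j)}) \Big\}
&= \pi_j(t_1) w_j \big(E^{A2}(N_j|H_1)\big) \\
&\quad + \pi_1(t_1) w_1 \big[ E^{A2}(N_j|H_0) + E^{A2}(N_1|H_1) \big] \\
&\quad + \sum_{k=2, k \neq j}^{K} \Big[ \pi_k(t_1) w_k \times \Big( E^{A2}(N_j|H_0) + \sum_{i=1, i \neq j}^{k-1} E^{A2}(N_i|H_0) + E^{A2}(N_k|H_1) \Big) \Big].
\end{aligned} \tag{36}$$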

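We use the interchange argument to prove the theorem.
Consider a selection rule $\phi^{(1)}$ that selects component 1 first
followed by components $j, 2, 3, \ldots, j - 1, j + 1, \ldots, K$. Similar to
(36), the expected weighted sum of completion times achieved
by $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ is given by:

$$\begin{aligned}
E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A2}, \delta^{A2}, \phi^{(1)}) \Big\}
&= \pi_1(t_1) w_1 \big(E^{A2}(N_1|H_1)\big) \\
&\quad + \pi_j(t_1) w_j \big[ E^{A2}(N_1|H_0) + E^{A2}(N_j|H_1) \big] \\
&\quad + \sum_{k=2, k \neq j}^{K} \Big[ \pi_k(t_1) w_k \times \Big( E^{A2}(N_j|H_0) + \sum_{i=1, i \neq j}^{k-1} E^{A2}(N_i|H_0) + E^{A2}(N_k|H_1) \Big) \Big].
\end{aligned} \tag{37}$$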

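By comparing (36) and (37), it can be verified that:

$$E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A2}, \delta^{A2}, \phi^{(1)}) \Big\}
\le E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A2}, \delta^{A2}, \phi^{(j)}) \Big\}$$

since

$$\frac{\pi_1(t_1) w_1}{E^{A2}(N_1|H_0)} \ge \frac{\pi_j(t_1) w_j}{E^{A2}(N_j|H_0)}.$$

Indeed, the two selection rules differ only in the order of
components 1 and $j$, and the difference between the two objective
values is $\pi_1(t_1) w_1 E^{A2}(N_j|H_0) - \pi_j(t_1) w_j E^{A2}(N_1|H_0) \ge 0$.
The expected weighted sum of completion times can be
reduced by selecting component 1 first followed by component
$j$, which contradicts the optimality of $\phi^{(j)}$. Hence,
at time $t_1$ selecting component 1 minimizes the expected
weighted sum of completion times. We have already proved
that selecting the last $K - 1$ components in decreasing order
of $\pi_k(t_1) w_k / E^{A2}(N_k|H_0)$ minimizes the objective function,
which completes the proof. ■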
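For a small number of components, the statement of Theorem 4 can be checked numerically by exhaustive search over all static orders. The sketch below is illustrative only: the function name `exclusive_cost` and all numeric values are assumptions made for the example, and the expected sample sizes are treated as fixed inputs (as they are for the SPRT, whose sample size does not depend on the selection rule).

```python
from itertools import permutations
from typing import Sequence


def exclusive_cost(
    order: Sequence[int],
    prior: Sequence[float],
    weight: Sequence[float],
    n_h0: Sequence[float],
    n_h1: Sequence[float],
) -> float:
    """Expected weighted completion time of the single abnormal component.

    If component k is the abnormal one, every component tested before it is
    healthy, so its completion time is the sum of the earlier E(N_i | H_0)
    plus its own E(N_k | H_1).
    """
    cost = 0.0
    elapsed_h0 = 0.0
    for k in order:
        cost += prior[k] * weight[k] * (elapsed_h0 + n_h1[k])
        elapsed_h0 += n_h0[k]
    return cost


# Illustrative numbers only (hypothetical).
prior = [0.1, 0.4, 0.3, 0.2]
weight = [5.0, 1.0, 2.0, 3.0]
n_h0 = [12.0, 8.0, 20.0, 10.0]
n_h1 = [9.0, 7.0, 15.0, 6.0]

index_order = tuple(
    sorted(range(4), key=lambda k: prior[k] * weight[k] / n_h0[k], reverse=True)
)
brute_order = min(
    permutations(range(4)),
    key=lambda o: exclusive_cost(o, prior, weight, n_h0, n_h1),
)
print(index_order, brute_order)
```

The exhaustive search returns the same order as the index rule in this instance. The search covers static orders only; the theorem additionally covers dynamic rules.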

VI.  Localization  of  Anomaly  Under  Uncertainty 
In the previous sections, we focused on the simple hypotheses
case, where the distributions under both hypotheses
are completely known. For this case, the SPRT was applied
in Algorithms 1, 2 to solve (3). However, in numerous cases
under the adversary model, there is uncertainty in the observation
distribution (in particular when the component is in
an abnormal state). Therefore, in this section we extend our
results to the case of composite hypotheses, where there is
uncertainty in the distribution parameters.

Let $\theta_k$ be a vector of unknown parameters of component
$k$. The observations $\{y_k(i)\}_{i \ge 1}$ are drawn from a common
distribution $f(y|\theta_k)$, $\theta_k \in \Theta_k$, where $\Theta_k$ is the parameter
space of component $k$. If component $k$ is in a healthy state,
then $\theta_k \in \Theta_k^{(0)}$; if component $k$ is abnormal, then
$\theta_k \in \Theta_k \setminus \Theta_k^{(0)}$.

Let $\Theta_k^{(0)}$, $\Theta_k^{(1)}$ be disjoint subsets of $\Theta_k$, where
$I_k = \Theta_k \setminus (\Theta_k^{(0)} \cup \Theta_k^{(1)}) \neq \emptyset$ is an indifference region$^3$.
When $\theta_k \in I_k$, the detector is indifferent regarding the state of
component $k$. Hence, there are no constraints on the error
probabilities for all $\theta_k \in I_k$. The hypothesis testing regarding
component $k$ is to test

$\theta_k \in \Theta_k^{(0)}$ against $\theta_k \in \Theta_k^{(1)}$.

Narrowing $I_k$ has the price of increasing the sample size.

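Let

$$\hat{\theta}_k(n) = \arg\max_{\theta_k \in \Theta_k} f(\mathbf{y}_k(n)|\theta_k), \qquad
\hat{\theta}_k^{(i)}(n) = \arg\max_{\theta_k \in \Theta_k^{(i)}} f(\mathbf{y}_k(n)|\theta_k), \tag{38}$$

be the Maximum-Likelihood Estimates (MLEs) of the parameters
over the parameter spaces $\Theta_k$, $\Theta_k^{(i)}$ at stage $n$, respectively.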

In contrast to the SPRT (for the simple hypotheses case),
the theory of sequential tests of composite hypotheses does
not provide optimal performance in terms of minimizing the
expected sample size under given error constraints. Nevertheless,
asymptotically optimal performance can be obtained as
the error probability approaches zero.

First,  we  provide  an  overview  of  existing  sequential  tests 
for  composite  hypotheses  which  are  relevant  to  our  problem. 
Next,  we  apply  these  techniques  to  solve  (2). 


A.  Existing  Sequential  Tests  for  Composite  Hypothesis  Testing 

The key idea of sequential tests of composite hypotheses,
discussed in this section, is to use the estimated parameters
to perform a one-sided sequential test to reject $H_0$ and a
one-sided sequential test to reject $H_1$. Note that these techniques
were introduced for a single process. However, in this paper
we apply sequential tests for $K$ components. Thus, we use the
subscript $k$ to denote the component index.

1)  Sequential  Generalized  Likelihood  Ratio  Test  (SGLRT): 
We  refer  to  sequential  tests  that  use  the  Generalized  Likelihood 
Ratio  (GLR)  statistics  [32]  as  the  SGLRT. 

For $i = 0, 1$, let

$$L_k^{(i),GLR}(n) = \log \frac{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k(n))}{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k^{(i)}(n))} \tag{39}$$

be the GLR statistics used to reject hypothesis $H_i$ at stage $n$.
Let

$$N_k^{(i)} = \inf \big\{ n : L_k^{(i),GLR}(n) \ge B_k^{(i)} \big\} \tag{40}$$

be the stopping rule used to reject hypothesis $H_i$, where $B_k^{(i)}$ is the
boundary value.

For each component $k$, the IDS stops sampling when $N_k =
\min \{ N_k^{(0)}, N_k^{(1)} \}$. If $N_k = N_k^{(0)}$, component $k$ is declared
as abnormal (i.e., $H_0$ is rejected). If $N_k = N_k^{(1)}$, component
$k$ is declared as normal (i.e., $H_0$ is accepted).

$^3$ The assumption of an indifference region is widely used in the theory of
sequential testing of composite hypotheses to derive asymptotically optimal
performance. Nevertheless, in some cases this assumption can be removed.
For more details, the reader is referred to [27].

The SGLRT was first studied by Schwartz [24] for a
one-parameter exponential family, who assigned a cost of $c$ for
each observation and a loss function for wrong decisions. It was
shown that setting $B_k^{(i)} = \log(c_k^{-1})$ asymptotically minimizes
the Bayes risk as $c_k$ approaches zero. Further refinement was
studied by Lai [27], [29], who set a time-varying boundary
value $B_k^{(i)} \sim \log((n c_k)^{-1})$. Lai showed that for a multivariate
exponential family this scheme asymptotically minimizes both
the Bayes risk and the expected sample size subject to error
constraints as $c_k$ approaches zero [29].

2)  Sequential  Adaptive  Likelihood  Ratio  Test  (SALRT):  We 
refer  to  sequential  tests  that  use  the  Adaptive  Likelihood  Ratio 
(ALR)  statistics  as  the  SALRT. 

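For $i = 0, 1$, let

$$L_k^{(i),ALR}(n) = \log \frac{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k(r-1))}{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k^{(i)}(n))} \tag{41}$$

be the ALR statistics used to reject hypothesis $H_i$ at stage $n$.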
Let

$$N_k^{(i)} = \inf \big\{ n : L_k^{(i),ALR}(n) \ge B_k^{(i)} \big\} \tag{42}$$

be the stopping rule used to reject hypothesis $H_i$.

For each component $k$, the IDS stops sampling when $N_k =
\min \{ N_k^{(0)}, N_k^{(1)} \}$. If $N_k = N_k^{(0)}$, component $k$ is declared
as abnormal (i.e., $H_0$ is rejected). If $N_k = N_k^{(1)}$, component
$k$ is declared as normal (i.e., $H_0$ is accepted).
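The stopping and decision logic shared by the SGLRT and the SALRT can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `two_sided_stop` is hypothetical, the two statistic sequences are assumed to be precomputed (entry `n - 1` holds $L_k^{(i)}(n)$), and the tie-break when both boundaries are crossed at the same stage is an assumption of this sketch.

```python
from typing import Optional, Sequence, Tuple


def two_sided_stop(
    stat_reject_h0: Sequence[float],
    stat_reject_h1: Sequence[float],
    b0: float,
    b1: float,
) -> Optional[Tuple[int, str]]:
    """Run two one-sided tests in parallel and stop at the first crossing.

    Returns (N_k, decision), or None if neither boundary is crossed within
    the supplied observations.
    """
    for n, (l0, l1) in enumerate(zip(stat_reject_h0, stat_reject_h1), start=1):
        if l0 >= b0:
            # N_k = N_k^(0): H_0 is rejected (assumed to win ties).
            return n, "abnormal"
        if l1 >= b1:
            # N_k = N_k^(1): H_1 is rejected, i.e. H_0 is accepted.
            return n, "normal"
    return None


# Illustrative statistic paths only (hypothetical).
print(two_sided_stop([0.5, 1.2, 3.1], [0.1, 0.2, 0.3], b0=3.0, b1=3.0))
print(two_sided_stop([0.1, 0.2], [1.0, 4.0], b0=3.0, b1=3.0))
```

Only the way the statistic sequences are computed differs between the two tests; the stopping logic is the same.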

The SALRT was first introduced by Robbins and Siegmund
[25], [26] to design power-one sequential tests. Pavlov used it
to design asymptotically (as the error probability approaches
zero) optimal (in terms of minimizing the expected sample
size subject to error constraints) tests for composite hypothesis
testing of a multivariate exponential family [28]. Tartakovsky
shows asymptotically optimal performance for a more general
multivariate family of distributions [30].

The advantage of using the SALRT is that setting
$B_k^{(0)} = \log(1/\alpha_k)$, $B_k^{(1)} = \log(1/\beta_k)$ satisfies the error
probability constraints in (3). However, such a simple setting cannot
be applied to the SGLRT. Thus, implementing the SALRT is much simpler
than implementing the SGLRT. The disadvantage of using
the SALRT is that poor early estimates (for a small number
of observations) can never be revised even though one has
a large number of observations. Thus, generally, the SGLRT
outperforms the SALRT in terms of minimizing the expected
sample size for given type I and type II errors.

TABLE III

Algorithm 3 for the independent model under uncertainty

1. arrange the components in decreasing
   order of $\pi_k(t_1) w_k / E(N_k)$
2. for k = 1, ..., K components do:
3.   perform SALRT/SGLRT for component k,
     with $P_k^{FA} \le \alpha_k$, $P_k^{MD} \le \beta_k$
4. end for


TABLE IV

Algorithm 4 for the exclusive model under uncertainty

1. arrange the components in decreasing
   order of $\pi_k(t_1) w_k / E(N_k|H_0)$
2. for k = 1, ..., K components do:
3.   perform SALRT/SGLRT for component k,
     with $P_k^{FA} \le \alpha_k$, $P_k^{MD} \le \beta_k$
4. end for


B.  The  Proposed  Solutions  for  the  Independent  and  Exclusive 
Models 

In this section we modify Algorithms 1, 2, given in Tables
I, II, to take into account the uncertainty in the model of the
adversary. Based on the solution to the two-stage optimization
problem, we propose Algorithms 3 and 4 to solve (2)
for the independent and exclusive models under uncertainty,
respectively. The algorithms are presented in Tables III, IV.
The required modification is in step 3 of both algorithms.
Under uncertainty, one should perform the SGLRT or the SALRT,
as discussed in the previous section, instead of the SPRT.

Remark 3: Implementing Algorithms 3, 4 requires computing
the expected sample size $E(N_k|H_i)$ for all $k = 1, 2, \ldots, K$, for
$i = 0, 1$, achieved by the SGLRT or the SALRT. In general, it
is difficult to obtain a closed-form expression for the exact
value of $E(N_k|H_i)$. However, we can use the asymptotic
property of the tests to obtain a closed-form approximation
to $E(N_k|H_i)$, which approaches the exact expected sample
size as the error probability approaches zero.

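For every $i = 0, 1$, let

$$D_k(\theta_k \| \lambda) = E_{\theta_k}\left[ \log \frac{f(y|\theta_k)}{f(y|\lambda)} \right] \tag{43}$$

be the KL divergence between the real value of $\theta_k$ and $\lambda$,
where the expectation is taken with respect to $f(y|\theta_k)$,
and let

$$D_k^*(\theta_k \| \Theta_k^{(i)}) = \inf_{\lambda \in \Theta_k^{(i)}} D_k(\theta_k \| \lambda). \tag{44}$$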


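Let $P^{(i)}(\theta_k)$ be a prior distribution on $\theta_k$ under hypothesis $H_i$
at component $k$. Then, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$, the expected
sample size is given by:

$$\begin{aligned}
E(N_k|H_0) &\sim \int_{\theta_k \in \Theta_k^{(0)}} \frac{\log B_k^{(1)}}{D_k^*(\theta_k \| \Theta_k^{(1)})} \, dP^{(0)}(\theta_k), \\
E(N_k|H_1) &\sim \int_{\theta_k \in \Theta_k^{(1)} \cup I_k^{(1)}} \frac{\log B_k^{(0)}}{D_k^*(\theta_k \| \Theta_k^{(0)})} \, dP^{(1)}(\theta_k)
+ \int_{\theta_k \in I_k^{(0)}} \frac{\log B_k^{(1)}}{D_k^*(\theta_k \| \Theta_k^{(1)})} \, dP^{(1)}(\theta_k),
\end{aligned} \tag{45}$$

where $I_k^{(0)}$, $I_k^{(1)}$ are disjoint subsets of $I_k$ and $I_k = I_k^{(0)} \cup I_k^{(1)}$.
For all $\theta_k \in I_k^{(i)}$ we have

$$\frac{\log B_k^{(j)}}{D_k^*(\theta_k \| \Theta_k^{(j)})} \le \frac{\log B_k^{(i)}}{D_k^*(\theta_k \| \Theta_k^{(i)})}$$

for $i, j = 0, 1$, $j \neq i$.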

At each time $t$, the expected sample size required to make a
decision regarding the state of component $k$ is given by:

$$E(N_k) = \pi_k(t) E(N_k|H_1) + (1 - \pi_k(t)) E(N_k|H_0), \tag{46}$$

which can be well approximated for small error probability
using (45).

Remark 4: In numerous cases, uncertainty is
associated with the abnormal state solely, where the distribution
under the normal state is completely known. In these cases,
evaluating $E(N_k)$ to implement Algorithm 3 depends on the prior
distribution on $\theta_k \in \Theta_k \setminus \Theta_k^{(0)}$, while evaluating $E(N_k|H_0)$ to
implement Algorithm 4 does not.
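The difference between the two indices noted in Remark 4 can be made concrete with a short sketch. The function names and numbers below are assumptions made for illustration; the conditional expected sample sizes are taken as given inputs (for instance, from the asymptotic approximation).

```python
def expected_sample_size(prior: float, n_h0: float, n_h1: float) -> float:
    """Mixture in (46): pi_k * E(N_k | H_1) + (1 - pi_k) * E(N_k | H_0)."""
    return prior * n_h1 + (1.0 - prior) * n_h0


def independent_index(prior: float, weight: float, n_h0: float, n_h1: float) -> float:
    """Index used in Algorithm 3: pi_k * w_k / E(N_k)."""
    return prior * weight / expected_sample_size(prior, n_h0, n_h1)


def exclusive_index(prior: float, weight: float, n_h0: float) -> float:
    """Index used in Algorithm 4: pi_k * w_k / E(N_k | H_0)."""
    return prior * weight / n_h0


# Illustrative numbers only (hypothetical).
en = expected_sample_size(prior=0.8, n_h0=40.0, n_h1=25.0)
idx_ind = independent_index(prior=0.8, weight=2.0, n_h0=40.0, n_h1=25.0)
idx_exc = exclusive_index(prior=0.8, weight=2.0, n_h0=40.0)
print(en, idx_ind, idx_exc)
```

Only `independent_index` takes `n_h1`, the quantity that depends on the prior over the abnormal-state parameter; `exclusive_index` needs the healthy-state sample size alone.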


C.  Asymptotic  Optimality  of  Algorithms  3,4 

In what follows we show that Algorithms 3, 4 are
asymptotically optimal in terms of minimizing the objective function
subject to the error constraints (2) as the error probability
approaches zero. When deriving asymptotics, we assume that
$P_k^{FA} \to 0$, $P_k^{MD} \to 0$ for all $k$ such that the asymptotic
optimality property in terms of minimizing the expected sample
size subject to the error constraints holds for each single
process for both the SGLRT and the SALRT, as discussed in Section
VI-A$^4$.

Theorem 5: Consider the independent model under
uncertainty. Let $(\tau^*, \delta^*, \phi^*)$ be the optimal solution to (2). Let
$(\tau^{A3}, \delta^{A3}, \phi^{A3})$ be the solution achieved by Algorithm 3.
Then, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$ for all $k$, we obtain:

$$E\Big\{ \sum_{k \in \mathcal{H}_1} w_k C_k \,\Big|\, (\tau^{A3}, \delta^{A3}, \phi^{A3}) \Big\}
\sim E\Big\{ \sum_{k \in \mathcal{H}_1} w_k C_k \,\Big|\, (\tau^*, \delta^*, \phi^*) \Big\}. \tag{47}$$

Proof: For every $k$, let $E^*(N_k|H_i)$ be the minimal expected
sample size that can be achieved by any sequential test, such
that the error constraints are satisfied. Let $E^{A3}(N_k|H_i)$ be the
expected sample size achieved by Algorithm 3, such that the error
constraints are satisfied. Clearly, $E^*(N_k|H_i) \le E^{A3}(N_k|H_i)$
for all $k$, for $i = 0, 1$.

$^4$ Asymptotic optimality for a single process is guaranteed for an exponential
family of distributions when $\log P_k^{FA} \sim \log P_k^{MD} \sim \log B^{-1}$ (which is
satisfied by setting the boundary values $B_k^{(i)}$, for $i = 0, 1$, in terms of $B$ and
some positive constants, and letting $B$ approach infinity) under some weak
conditions on the parameter distribution. Nevertheless, more general results
can be obtained in some cases. For more details, the reader is referred to
Section VI-A and references therein.

Assume that

$$\frac{\pi_1(t_1) w_1}{E^*(N_1)} \ge \frac{\pi_2(t_1) w_2}{E^*(N_2)} \ge \cdots \ge \frac{\pi_K(t_1) w_K}{E^*(N_K)}. \tag{48}$$

Similar to the proof of Theorem 2, it can be verified that the
optimal solution to (2) is given by selecting the components
in the following order: $1, 2, \ldots, K$, where the components are
tested by a sequential test that achieves expected sample size
$E^*(N_k|H_i)$ for all $k$, for $i = 0, 1$. Therefore, the expected
weighted sum of completion times achieved by $(\tau^*, \delta^*, \phi^*)$
is given by:

$$E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^*, \delta^*, \phi^*) \Big\}
= \sum_{k=1}^{K} \pi_k(t_1) w_k \Big( \sum_{r=1}^{k-1} E^*(N_r) + E^*(N_k|H_1) \Big). \tag{49}$$

By the asymptotic optimality property of the SALRT/SGLRT
for a single process (used in Algorithm 3), it follows that
$E^{A3}(N_k|H_i) \sim E^*(N_k|H_i)$ for all $k$, for $i = 0, 1$, as
$P_k^{FA} \to 0$, $P_k^{MD} \to 0$. As a result, for sufficiently small error
probabilities, the solution $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ is given by selecting
the components in the following order: $1, 2, \ldots, K$, where the
components are tested by an asymptotically optimal sequential
test that achieves expected sample size $E^{A3}(N_k|H_i)$ for all
$k$, for $i = 0, 1$. Therefore, the expected weighted sum of
completion times achieved by $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ is given by:

$$E\Big\{ \sum_{k=1}^{K} w_k C_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\Big|\, (\tau^{A3}, \delta^{A3}, \phi^{A3}) \Big\}
= \sum_{k=1}^{K} \pi_k(t_1) w_k \Big( \sum_{r=1}^{k-1} E^{A3}(N_r) + E^{A3}(N_k|H_1) \Big). \tag{50}$$

Since $E^{A3}(N_k|H_i) \sim E^*(N_k|H_i)$ for $i = 0, 1$ as $P_k^{FA} \to
0$, $P_k^{MD} \to 0$ for all $k$, the theorem follows. ■


Theorem 6: Consider the exclusive model under
uncertainty. Let $(\tau^*, \delta^*, \phi^*)$ be the optimal solution to (2). Let
$(\tau^{A4}, \delta^{A4}, \phi^{A4})$ be the solution achieved by Algorithm 4.
Then, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$ for all $k$, we obtain:

$$E\Big\{ \sum_{k \in \mathcal{H}_1} w_k C_k \,\Big|\, (\tau^{A4}, \delta^{A4}, \phi^{A4}) \Big\}
\sim E\Big\{ \sum_{k \in \mathcal{H}_1} w_k C_k \,\Big|\, (\tau^*, \delta^*, \phi^*) \Big\}. \tag{51}$$

Proof: The structure of the proof is similar to the proof of
Theorem 5. Hence, we provide a sketch of the proof, using notation
similar to that used in the proof of Theorem 5. Similar to the proof
of Theorem 4, it can be verified that the optimal solution to (2)
is given by selecting the components in decreasing order of
$\pi_k(t_1) w_k / E^*(N_k|H_0)$, where the components are tested by a
sequential test that achieves expected sample size $E^*(N_k|H_i)$
for all $k$, for $i = 0, 1$. By the asymptotic optimality property
for a single process of the SALRT/SGLRT (used in Algorithm
4), it follows that $E^{A4}(N_k|H_i) \sim E^*(N_k|H_i)$ for all $k$, for
$i = 0, 1$, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$. As a result, for
sufficiently small error probabilities, the solution $(\tau^{A4}, \delta^{A4}, \phi^{A4})$
is given by selecting the components in decreasing order of
$\pi_k(t_1) w_k / E^*(N_k|H_0)$, where the components are tested by an
asymptotically optimal sequential test that achieves expected
sample size $E^{A4}(N_k|H_i)$ for all $k$, for $i = 0, 1$. Similar to
the proof of Theorem 5, comparing the objective functions
achieved by $(\tau^*, \delta^*, \phi^*)$ and $(\tau^{A4}, \delta^{A4}, \phi^{A4})$ proves the
theorem. ■

VII. Applications and Numerical Examples

In this section, we provide applications and numerical
examples to illustrate the performance of the algorithms. Assume
that an intruder tries to launch a Denial of Service (DoS)
or Reduction of Quality (RoQ) attack by sending a large
number of packets to a component (which can be a relay
node in this application). DoS attacks rely on overwhelming
the component with useless traffic that constantly exceeds its
capacity so as to make it unavailable for its intended use. On
the other hand, RoQ attacks inflict damage on the component,
while keeping a low profile to avoid detection. RoQ attacks
do not cause denial of service.

To detect such attacks, the IDS performs traffic-based
anomaly detection. It monitors the traffic at each component
to decide whether a component is compromised. Roughly
speaking, if the actual arrival rate is significantly higher than
the arrival rate under the normal state, then the IDS should
declare that the component is in an abnormal state. Similar
traffic-based detection techniques were proposed in [7], [12]
for different models, considering a single process without
switching to other nodes.

For each component $k$, we assume that packets arrive
according to a Poisson process with rate $\theta_k$, which is
generally considered to be a good model in queuing theory
analysis [33]. When component $k$ is tested, the IDS collects
an observation $y_k(n) \in \mathbb{N}_0$ every time unit, which represents
the number of packets that arrived in the interval $(n - 1, n)$.
Assume that the IDS considers component $k$ as normal if
$\theta_k \le \theta_k^{(0)}$, and tests $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$ (i.e.,
$I_k = \{\theta_k \mid \theta_k^{(0)} < \theta_k < \theta_k^{(1)}\}$ is the indifference region).

We set $w_k = \theta_k^{(0)}$. Under this setting, the objective function
represents the total expected number of failed packets in the
network during DoS attacks. Thus, the optimization problem
can be viewed as minimizing the maximal damage to the
network in terms of packet loss. Furthermore, this setting
prioritizes components with higher normal traffic to reduce
the delay caused by RoQ attacks.


A.  Detection  Under  Simple  Hypotheses 

In this section, we consider the case where the parameters
$\theta_k = \theta_k^{(0)}$ under the normal state and $\theta_k = \theta_k^{(1)}$ under the abnormal
state are known to the IDS. To implement Algorithms 1, 2
(which are optimal in this scenario for the independent and
exclusive model, respectively), we need to compute the LR
(or the log-LR) between the hypotheses, defined in (4), and
the expected sample sizes under the hypotheses, which can be
well approximated by (9).

Let

$$\Lambda_k(n) = \log L_k(n) \tag{52}$$

be the Log-Likelihood Ratio (LLR) between the two hypotheses
of component $k$ at stage $n$, where $L_k(n)$ is defined in (4).
After algebraic manipulations, it can be verified that the LLR is
given by:

$$\Lambda_k(n) = -n \big( \theta_k^{(1)} - \theta_k^{(0)} \big) + \log \big( \theta_k^{(1)} / \theta_k^{(0)} \big) \sum_{i=1}^{n} y_k(i). \tag{53}$$

It can be verified that the KL divergence between the hypotheses
$H_i$ and $H_j$, defined in (8), is given by:

$$D_k(i \| j) = \theta_k^{(j)} - \theta_k^{(i)} + \theta_k^{(i)} \log \big( \theta_k^{(i)} / \theta_k^{(j)} \big). \tag{54}$$

Substituting (54) in (9) yields the required approximation to
the expected sample size.
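The closed forms (53) and (54) are straightforward to check numerically. The following sketch is illustrative only: the function names and the sample observations are assumptions made for the example. It compares the closed-form LLR with the LLR obtained directly from the Poisson log-probability mass function.

```python
import math
from typing import Sequence


def poisson_log_pmf(y: int, rate: float) -> float:
    """log of the Poisson pmf, using lgamma(y + 1) = log(y!)."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def poisson_llr(obs: Sequence[int], rate0: float, rate1: float) -> float:
    """Closed-form LLR (53) between rate1 (abnormal) and rate0 (normal)."""
    n = len(obs)
    return -n * (rate1 - rate0) + math.log(rate1 / rate0) * sum(obs)


def poisson_kl(rate_i: float, rate_j: float) -> float:
    """Closed-form KL divergence (54) between Poisson(rate_i) and Poisson(rate_j)."""
    return rate_j - rate_i + rate_i * math.log(rate_i / rate_j)


# Illustrative packet counts only (hypothetical).
obs = [12, 9, 15, 11]
closed_form = poisson_llr(obs, rate0=10.0, rate1=15.0)
direct = sum(poisson_log_pmf(y, 15.0) - poisson_log_pmf(y, 10.0) for y in obs)
print(closed_form, direct, poisson_kl(10.0, 15.0))
```

The `log(y!)` terms cancel in the ratio, which is why (53) depends on the observations only through their sum.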

Next, we provide a numerical example to illustrate the
performance of the algorithms. We compared three schemes: a
Random selection SPRT (R-SPRT), where a series of SPRTs
are performed until all the components are tested in a random
order, and the proposed Algorithms 1, 2, which are optimal
for the independent and exclusive models, respectively.

Let $\delta_K = (100 - 10)/(K - 1)$. We set $w_k = \theta_k^{(0)} = 10 +
(k - 1)\delta_K$ and $\theta_k^{(1)} = 1.5 \cdot \theta_k^{(0)}$. The error constraints were set
to $P_k^{FA} = P_k^{MD} = 10^{-2}$ for all $k$. For the independent and
exclusive models, we set $\pi_k = 0.8$ and $\pi_k = 1/K$ for all $k$,
respectively. The performance of Algorithm 1 and Algorithm 2
is presented in Fig. 1(a) and 1(b) under the independent and
exclusive models, respectively, as compared to the R-SPRT. It
can be seen that the proposed algorithms save roughly 50%
of the objective value as compared to the R-SPRT under both
the independent and exclusive model scenarios.

(a) An independent model scenario.

Fig. 1. Objective value as a function of the number of components under
the independent and exclusive models.

B. Detection Under Uncertainty

In this section, we consider the case of composite hypotheses,
where there is uncertainty in the distribution parameters
(in particular when the component is in an abnormal state), as
discussed in Section VI. To implement Algorithms 3, 4 (which
are asymptotically optimal in this scenario for the independent
and exclusive model, respectively), we need to compute the
GLR or ALR statistics between the hypotheses, defined in
(39), (41), and the expected sample sizes under the hypotheses,
which can be well approximated by (45). The MLEs of the
parameters over the parameter spaces $\Theta_k$, $\Theta_k^{(i)}$ are given by the
sample mean and the boundary of the alternative parameter
space, respectively. As a result, substituting:

$$\hat{\theta}_k(n) = \frac{1}{n} \sum_{i=1}^{n} y_k(i), \qquad \hat{\theta}_k^{(i)}(n) = \theta_k^{(i)}, \tag{55}$$

in (39), (41) yields the GLR and ALR statistics, respectively.
The KL divergence between the real value of $\theta_k$ and the
parameter space $\Theta_k^{(i)}$ is given by:

$$D_k^*(\theta_k \| \Theta_k^{(i)}) = \theta_k^{(i)} - \theta_k + \theta_k \log \big( \theta_k / \theta_k^{(i)} \big). \tag{56}$$

Substituting (56) in (45) yields the approximate expected
sample size.
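Under the substitution described above (sample mean for the unrestricted MLE, boundary value for the restricted one), the Poisson GLR statistic reduces to $n$ times the divergence in (56) evaluated at the sample mean. The sketch below checks this numerically; the function names and sample observations are assumptions made for illustration, and only the GLR statistic is covered.

```python
import math
from typing import Sequence


def poisson_kl_to_boundary(rate: float, boundary: float) -> float:
    """Divergence (56) between a Poisson rate and a boundary rate."""
    if rate == 0.0:
        return boundary  # limit of rate * log(rate / boundary) as rate -> 0 is 0
    return boundary - rate + rate * math.log(rate / boundary)


def poisson_glr(obs: Sequence[int], boundary: float) -> float:
    """GLR statistic with the sample mean against a fixed boundary rate."""
    n = len(obs)
    total = sum(obs)
    mean = total / n
    # Log-likelihoods up to the common -sum(log y!) term, which cancels.
    ll_mle = total * math.log(mean) - n * mean if total > 0 else 0.0
    ll_boundary = total * math.log(boundary) - n * boundary
    return ll_mle - ll_boundary


# Illustrative packet counts only (hypothetical); boundary rate 19 as in the
# numerical example of this section.
obs = [22, 25, 19, 24]
glr = poisson_glr(obs, boundary=19.0)
mean_rate = sum(obs) / len(obs)
print(glr, len(obs) * poisson_kl_to_boundary(mean_rate, 19.0))
```

The identity explains why the statistic grows roughly linearly in $n$ at the rate given by (56) when the true rate is away from the boundary.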

Next, we provide a numerical example to illustrate the performance
of the algorithms under uncertainty. We simulated
a network with homogeneous components (i.e., any selection
rule is optimal). We compared three schemes: the R-SPRT, and
Algorithms 3 or 4 (which achieve the same performance in this
case) using the SALRT and the SGLRT, discussed in Section
VI-A. We set $\theta_k^{(0)} = 19$, $\theta_k^{(1)} = 21$. Under uncertainty, the
IDS considers component k as normal if $\theta_k \le \theta_k^{(0)}$ and tests
$\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$ (i.e., $I_k = \{\theta_k \mid 19 < \theta_k < 21\}$ is
the indifference region). To implement the SGLRT, we set the
cost per observation $c = 10^{-3}$. According to the assigned cost,
we obtained the following error probability constraints for all
k: $P_k^{FA} \le 0.026$ for all $\theta_k \le 19$ and $P_k^{MD} \le 0.03$ for all
$\theta_k \ge 21$. We do not restrict the detector's performance for
$19 < \theta_k < 21$ (note that narrowing the indifference region
comes at the price of a larger sample size). In Fig. 2 we show
the average number of observations required for detection as a
function of $\theta_k$. As expected, for $\theta_k = 19$ and $\theta_k = 21$ the
R-SPRT requires a smaller sample size than the proposed
schemes. On the other hand, for most values of $\theta_k$ the
SGLRT and the SALRT require a smaller sample size than
the R-SPRT. The SALRT performs the worst for
$18 < \theta_k < 22$ and performs the best for $\theta_k \notin (18, 22)$, roughly.
The SGLRT obtains the best average performance. For large
values of $\theta_k$ the anomaly is detected very quickly, since the
distance between the hypotheses increases.
This result confirms that DoS attacks are much easier to detect
than RoQ attacks.

(a) Average number of observations as a function of $\theta$

(b) Average number of observations as a function of $\theta$

Fig. 2. Average number of observations as a function of the arrival rate of
packets (denoted by $\theta$).
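
The effect of the true rate on the detection delay can be reproduced qualitatively with a small simulation. The sketch below is only an illustration under our own assumptions: it runs a plain two-point SPRT of rate 19 against rate 21 on Poisson counts with Wald's approximate thresholds, not the R-SPRT, SALRT, or SGLRT of this paper, and it does not reproduce Fig. 2. All function names are hypothetical.

```python
import math
import random

def poisson_sample(rng, theta):
    """Draw one Poisson(theta) count with Knuth's method (adequate for moderate theta)."""
    limit = math.exp(-theta)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sprt_sample_size(rng, theta_true, theta0=19.0, theta1=21.0,
                     p_fa=0.026, p_md=0.03):
    """Run one SPRT of theta0 vs. theta1; return (observations used, declared abnormal)."""
    upper = math.log((1 - p_md) / p_fa)   # Wald's approximate upper threshold
    lower = math.log(p_md / (1 - p_fa))   # Wald's approximate lower threshold
    llr, n = 0.0, 0
    while lower < llr < upper:
        y = poisson_sample(rng, theta_true)
        # Poisson log-likelihood ratio of one observation
        llr += y * math.log(theta1 / theta0) - (theta1 - theta0)
        n += 1
    return n, llr >= upper

def average_sample_size(theta_true, runs=2000, seed=0):
    """Monte Carlo estimate of the expected number of observations."""
    rng = random.Random(seed)
    return sum(sprt_sample_size(rng, theta_true)[0] for _ in range(runs)) / runs

for theta in (21.0, 25.0, 30.0):
    print(theta, average_sample_size(theta))
```

In this sketch the average number of observations drops as the true rate grows beyond 21, in line with the observation that large deviations are detected quickly.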

VIII.  Conclusion 

The problem of quickest localization of anomaly in a
resource-constrained cyber network was investigated. Due to
resource constraints, only one component can be probed at
each time. The observations are random realizations drawn
from two different distributions depending on whether the
component is normal or anomalous. The problem was formulated
as a priority-based constrained optimization problem.
Components with higher priorities in an abnormal state
should be fixed before components with lower priorities to
reduce the overall damage to the network. The objective
is to minimize the expected weighted sum of completion
times subject to error probability constraints. We considered
two different anomaly models: the independent model, in
which each component can be abnormal independent of other
components, and the exclusive model, in which there is one
and only one abnormal component. For the simple hypotheses
case, we derived optimal algorithms for both independent
and exclusive models. For the composite hypotheses case,
we derived asymptotically (as the error probability approaches
zero) optimal algorithms for both independent and exclusive
models. These optimal algorithms have low complexity.

The algorithms developed throughout this paper can be
applied to other models of anomaly detection as well. We can
adapt the proposed algorithms to any detection scheme that
performs a series of tests until all the components are tested.
The required modification is in step 3 of the algorithms, where
the SPRT/SALRT/SGLRT is replaced by any given test. As a
result, the modified algorithms minimize the objective function
among all the algorithms that perform the given test.